fix: reverse SOC/CAC detection order in POST /chunks handler #5445
significance wants to merge 6 commits into ethersphere:master
SOC chunks uploaded via the generic /chunks endpoint are misclassified as CAC chunks because the handler tries CAC parsing first, which always succeeds for valid SOC data (8-4104 bytes). These tests demonstrate that SOC uploads return incorrect addresses and are unretrievable.
Try SOC parsing before CAC in the chunk upload handler. CAC parsing accepts any data between 8-4104 bytes, so valid SOC data was always misclassified as CAC and stored at the wrong content-addressed hash instead of the correct SOC address (Hash(Identifier || Owner)).
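The ordering problem described in these commits can be illustrated with a minimal Go sketch. It is not the bee handler: the names `classifyCACFirst`, `classifySOCFirst`, `Kind`, and `socValidator` are hypothetical, and SOC validation is reduced to a caller-supplied stand-in rather than real signature verification. Only the 8 to 4104 byte CAC size range is taken from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// Size bounds for CAC parsing, as stated in the PR text (8 to 4104 bytes).
const (
	cacMinSize = 8
	cacMaxSize = 4104
)

// Kind is a hypothetical label for the detected chunk type.
type Kind string

const (
	KindCAC Kind = "cac"
	KindSOC Kind = "soc"
)

// socValidator is a stand-in for real SOC parsing and signature
// verification, which this sketch does not implement.
type socValidator func(data []byte) bool

var errInvalidChunk = errors.New("data is neither a valid CAC nor a valid SOC")

// fitsCAC mirrors the size-only acceptance described in the PR.
func fitsCAC(data []byte) bool {
	return len(data) >= cacMinSize && len(data) <= cacMaxSize
}

// classifyCACFirst reproduces the reported ordering: the size check
// passes for SOC data in range, so the SOC branch is never reached.
func classifyCACFirst(data []byte, isSOC socValidator) (Kind, error) {
	if fitsCAC(data) {
		return KindCAC, nil
	}
	if isSOC(data) {
		return KindSOC, nil
	}
	return "", errInvalidChunk
}

// classifySOCFirst applies the proposed ordering: try SOC, fall back to CAC.
func classifySOCFirst(data []byte, isSOC socValidator) (Kind, error) {
	if isSOC(data) {
		return KindSOC, nil
	}
	if fitsCAC(data) {
		return KindCAC, nil
	}
	return "", errInvalidChunk
}

func main() {
	data := make([]byte, 200)
	// Pretend the SOC validation succeeded for this payload.
	isSOC := func([]byte) bool { return true }

	before, _ := classifyCACFirst(data, isSOC)
	after, _ := classifySOCFirst(data, isSOC)
	fmt.Println(before, after) // prints: cac soc
}
```

The cost of the SOC-first ordering is that the validator runs on every upload, including plain CAC uploads.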
@significance thanks for this. a few comments:
I think this is highly needed and it would be good to have a generic chunk upload endpoint. In my opinion what would make the most sense is to have a This PR introduces a potential performance penalty when using only content-addressed chunks, because the signature would need to be validated even if the chunk is content-addressed. This could be mitigated by having a See related issues/mentions:
I think that having a header would make sense, then we can have a unified interface and get rid of the extra endpoints. The question is whether we want to have opinionated soc validation like today in the current soc upload endpoint (for me this is a bit too much. not sure it needs to go that deep).
yes, must admit i did not consider the performance implication there, as i was in app dev mode and just scratching an itch/removing barriers : ) from bee pov header sgtm, thank you both 🙇
acfa90b to 489b34f
Allow callers to explicitly specify chunk type ('soc'/'1' or 'cac'/'0')
via the Swarm-Chunk-Type header, skipping auto-detection and avoiding
unnecessary SOC signature validation for CAC uploads.
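A hedged Go sketch of how such a header might be interpreted follows. The header name and the accepted values ('soc'/'1', 'cac'/'0') come from the commit message above. The function name `parseChunkType`, the `chunkType` constants, the fallback to auto-detection when the header is absent, and the exact-match (case-sensitive) comparison are assumptions made for illustration, not the PR's actual code.

```go
package main

import (
	"fmt"
	"net/http"
)

// chunkTypeHeader is the header name given in the commit message.
const chunkTypeHeader = "Swarm-Chunk-Type"

// chunkType is a hypothetical enum for the caller's declared chunk type.
type chunkType int

const (
	typeAuto chunkType = iota // header absent: fall back to auto-detection (assumed)
	typeCAC
	typeSOC
)

// parseChunkType maps a header value to a chunk type.
// Exact, case-sensitive matching is an assumption of this sketch.
func parseChunkType(value string) (chunkType, error) {
	switch value {
	case "":
		return typeAuto, nil
	case "cac", "0":
		return typeCAC, nil
	case "soc", "1":
		return typeSOC, nil
	default:
		return typeAuto, fmt.Errorf("unsupported %s value %q", chunkTypeHeader, value)
	}
}

func main() {
	h := http.Header{}
	h.Set(chunkTypeHeader, "cac")

	ct, err := parseChunkType(h.Get(chunkTypeHeader))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// With an explicit "cac", a handler could skip SOC signature validation.
	fmt.Println(ct == typeCAC) // prints: true
}
```

Rejecting unknown values instead of silently auto-detecting is a design choice of this sketch: it surfaces client typos early.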
and immediately running into the problem again because CAC ⊄ SOC, SOC ⊂ CAC
there is no way to mitigate this without the chunk types
that SWIP is more about how we pass chunks around in the protocols. while it doesn't necessarily touch upon the point of API contracts (but rather wire protocol contracts), at least in theory - there's nothing preventing us from doing it in this way. the only thing to keep in mind is that once it leaves the wire protocol domain and enters the API, both upload and download must use the new protobuf type.
Checklist
Description
The POST /chunks handler tries CAC (Content Addressed Chunk) parsing before SOC (Single Owner Chunk) parsing. Since CAC parsing accepts any data between 8 and 4104 bytes, valid SOC data is always misclassified as CAC and stored at the wrong content-addressed hash instead of the correct SOC address (Hash(Identifier || Owner)).

This breaks client-side SOC construction workflows where pre-stamped SOCs are uploaded via POST /chunks: the chunk is stored but can never be retrieved at its expected SOC address.

The fix reverses the detection order: try SOC parsing first, fall back to CAC.
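To make the address mismatch concrete, here is a small Go sketch of the SOC address formula Hash(Identifier || Owner) from the description. The 32-byte identifier and 20-byte owner sizes are assumptions of this sketch, and `socAddress` and `hashFunc` are hypothetical names. The hash is injected as a parameter, and the demo plugs in SHA-256 from the standard library purely as a stand-in; it is not the hash function the network uses.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// Sizes assumed for this sketch: 32-byte identifier, 20-byte owner address.
const (
	identifierSize = 32
	ownerSize      = 20
)

// hashFunc lets the caller supply the hash; the real hash is out of scope here.
type hashFunc func(data []byte) [32]byte

// socAddress computes Hash(Identifier || Owner) as written in the PR description.
func socAddress(h hashFunc, identifier, owner []byte) ([32]byte, error) {
	if len(identifier) != identifierSize {
		return [32]byte{}, errors.New("identifier must be 32 bytes")
	}
	if len(owner) != ownerSize {
		return [32]byte{}, errors.New("owner must be 20 bytes")
	}
	// Concatenate identifier and owner, then hash the result.
	buf := make([]byte, 0, identifierSize+ownerSize)
	buf = append(buf, identifier...)
	buf = append(buf, owner...)
	return h(buf), nil
}

func main() {
	identifier := make([]byte, identifierSize)
	owner := make([]byte, ownerSize)

	// SHA-256 is a stand-in hash for demonstration only.
	addr, err := socAddress(sha256.Sum256, identifier, owner)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%x\n", addr)
}
```

The point of the sketch is that the SOC address depends only on the identifier and owner, not on the payload, so a chunk stored under a hash of its content cannot be found at the address a SOC reader would look up.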
Open API Spec Version Changes (if applicable)
N/A — no API contract change, only fixes incorrect internal routing of chunk types.
Motivation and Context
Client-side SOC construction (e.g. using bee-js makeContentAddressedChunk().toSingleOwnerChunk() then uploadChunk()) relies on POST /chunks correctly identifying and storing SOCs at their SOC address. Without this fix, all SOCs uploaded via this endpoint are silently stored as CACs at the wrong address.
Related Issue
N/A
Screenshots (if appropriate):
N/A
AI Disclosure
AI Canary 🦜
Updated diff not yet reviewed by the author